allow passing in a jar timestamp, sort entries by name #58

Closed

raboof
Contributor

@raboof raboof commented Jul 15, 2017

This is useful for achieving reproducible builds, where building the same sources with the same settings should result in the same artifact.

Related to sbt/zinc#333

@jvican
Member

jvican commented Jul 16, 2017

Hey @raboof, thanks for doing this! Let me get back to it in a few days, right now we're preparing RC.

@dwijnand
Member

Hey @raboof, do you have any idea what the possible implications of dropping the timestamps are?

Perhaps as an alternative to zeroing them out, we should introduce one or more parameters so that it becomes the client's decision whether to use currentTimeMillis/File.lastModified or not. WDYT?
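
As a rough illustration of such a parameter (this is not the code in this PR), the sketch below writes a jar with a caller-supplied timestamp and with entries sorted by name. The object `DeterministicJar`, its `write` method, and the `time` parameter are hypothetical names, and it uses plain `java.util.zip` rather than sbt's `IO.jar`.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{ZipEntry, ZipOutputStream}

// Hypothetical helper, not part of sbt/io.
object DeterministicJar {
  /** Writes `entries` to an in-memory zip/jar.
    * `time = Some(t)` stamps every entry with `t` (epoch millis);
    * `time = None` falls back to the current time, as a non-reproducible default.
    */
  def write(entries: Seq[(String, Array[Byte])], time: Option[Long]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zip = new ZipOutputStream(bytes)
    try {
      // Sorting by name makes the entry order independent of the input order.
      for ((name, content) <- entries.sortBy(_._1)) {
        val entry = new ZipEntry(name)
        entry.setTime(time.getOrElse(System.currentTimeMillis()))
        zip.putNextEntry(entry)
        zip.write(content)
        zip.closeEntry()
      }
    } finally zip.close()
    bytes.toByteArray
  }
}
```

With a fixed `time`, two calls with the same entries in a different order should produce byte-identical output on the same JDK. One caveat of sorting: tools that expect `META-INF/MANIFEST.MF` near the start of the archive may need the manifest handled separately.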

@raboof
Contributor Author

raboof commented Jul 17, 2017

@dwijnand I'm not aware of possible implications, and I didn't see any warnings on the reproducible builds maven plugin which also clears the timestamps.

In most cases these timestamps represent the 'time of last compilation' or 'time of packaging'; I'm not sure when that would be interesting information. I'm also not sure this information is even exposed via the classpath API.

I'd be OK with making this behavior configurable (though of course I'd like the default to be 'cleared' ;) ). The question is how/where to configure this and how to pass that information around. I can give it a shot.

@jvican
Member

jvican commented Jul 17, 2017

I still don't have time to look into this, but my gut feeling is that this could have implications for how Zinc detects changes in jars, as well as for scalac (to avoid caching the classpath entries). I need to look deeper, but I'm just throwing some observations out there...

@dwijnand
Member

Yeah, that's my concern too.

@typesafe-tools

dbuild has checked the following projects against Scala 2.12:

Project Reference Commit
sbt 1.0.x sbt/sbt@2d7ec47
zinc 1.0 sbt/zinc@d4d29e8
io pull/58/head b75de5a
librarymanagement 1.0 sbt/librarymanagement@0147e0c
util 1.0 sbt/util@81277cb
website 1.0.x sbt/website@c5ef5f0

❌ The result is: FAILED

@raboof
Contributor Author

raboof commented Jul 17, 2017

Aah, those kinds of tools could be impacted, yes... that might be hard to deal with. Let's see if we can find an example.

@dwijnand dwijnand closed this Jul 17, 2017
@dwijnand dwijnand changed the base branch from 1.0 to 1.x July 17, 2017 15:09
@dwijnand dwijnand reopened this Jul 17, 2017
@raboof raboof changed the title from "Get closer to repeatable builds by removing timestamps from jars" to "[not for merge] get closer to repeatable builds by removing timestamps from jars" Jul 20, 2017
@jvican
Member

jvican commented Jul 20, 2017

The thing I'm most worried about is that today I learned that the JVM caches jars based on the timestamp (scala/bug#10295), which conflicts with this change. If a jar changes completely, and the JVM is caching the jars for the same compiler run, a reference to the previous jar could be maintained, and that would cause compilation errors.

However, I'm not sure how all these pieces interact with each other. I need some first-hand experience, because I can't say for certain that this will be the result.

@raboof I really appreciate that you took this up. Would you mind creating an sbt plugin that overrides the sbt task creating a jar and nulls out the last modified time, and experimenting with it? I think that could give us great insights.

@raboof
Contributor Author

raboof commented Jul 20, 2017

Interesting find! scala/bug#10295 (comment) is talking about java.io.File though. Might that refer to the filesystem timestamp rather than the timestamp inside the jar (which this PR is about)?
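
To make the distinction concrete, here is a minimal sketch that reads both values for the same jar: the filesystem modification time via `File.lastModified`, and the time recorded inside the archive via `ZipEntry.getTime`. The object `TwoTimestamps` and its method names are hypothetical and only serve the illustration.

```scala
import java.io.{File, FileOutputStream}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

// Hypothetical helper for illustration only.
object TwoTimestamps {
  /** Writes a jar with a single entry whose in-jar timestamp is `entryTime`. */
  def writeSingleEntry(jar: File, entryName: String, entryTime: Long): Unit = {
    val zip = new ZipOutputStream(new FileOutputStream(jar))
    try {
      val entry = new ZipEntry(entryName)
      entry.setTime(entryTime) // stored inside the archive
      zip.putNextEntry(entry)
      zip.write(Array[Byte](1, 2, 3))
      zip.closeEntry()
    } finally zip.close()
  }

  /** Returns (filesystem mtime of the jar file, mtime recorded for `entryName` inside it). */
  def read(jar: File, entryName: String): (Long, Long) = {
    val zip = new ZipFile(jar)
    try {
      (jar.lastModified(), zip.getEntry(entryName).getTime)
    } finally zip.close()
  }
}
```

The two values are independent: changing the entry time inside the archive does not pin the file's own modification time, and `File.setLastModified` does not touch the entry times. Note that the in-jar value has a coarse (two-second) resolution in the classic zip format.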

@raboof
Contributor Author

raboof commented Jul 20, 2017

@jvican It looks like IO.jar() is only called from the Package task, but I'm not sure it'd be possible/easy to override that task... could you point me in the right direction?

@jvican
Member

jvican commented Jul 24, 2017

Right, so this only affects the files inside the zip files. Then I'm less scared. But I'm still positive that testing it should give us more confidence 😄 @raboof.

To override this in sbt, have a look at the implementation of packageBin, which is the task responsible for packaging jars IIRC.

Let's say we have project A, and project B, which depends on A. To check that this patch doesn't do anything dangerous (like affecting incremental compilation), I would override packageBin with the sbt syntax and reuse the return type, which is the jar (zip) file. It would be something similar to:

// remember to scope it in A
packageBin := {
  val packaged = packageBin.value
  // Write any timestamps inside the `packaged` file to zero
  // Do more stuff to make the jar reproducible
  ???
}

Following a similar methodology, you can instrument any task of sbt (which is one of the best features of the tool). If it's not by reusing the return type, it's by copy-pasting the original implementation defined in Defaults.scala and then modifying the bits you're interested in.
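
One possible shape for the "write any timestamps to zero" step is sketched below as plain Scala, so it can be tried outside of sbt first. The object `JarTimestamps` and its `normalize` method are hypothetical names; the sketch rewrites the jar in place, keeps the entry order and contents, and sets every in-jar timestamp to a value chosen by the caller.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.{Files, StandardCopyOption}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

// Hypothetical post-processing helper, not part of sbt or sbt/io.
object JarTimestamps {
  /** Rewrites `jar` in place so that every entry carries `time` (epoch millis). */
  def normalize(jar: File, time: Long): File = {
    val tmp = File.createTempFile("normalized", ".jar", jar.getAbsoluteFile.getParentFile)
    val in = new ZipFile(jar)
    val out = new ZipOutputStream(new FileOutputStream(tmp))
    try {
      val entries = in.entries()
      while (entries.hasMoreElements) {
        val original = entries.nextElement()
        // A fresh entry drops the original timestamp and other metadata.
        val copy = new ZipEntry(original.getName)
        copy.setTime(time)
        out.putNextEntry(copy)
        val data = in.getInputStream(original)
        try {
          val buffer = new Array[Byte](8192)
          var read = data.read(buffer)
          while (read != -1) {
            out.write(buffer, 0, read)
            read = data.read(buffer)
          }
        } finally data.close()
        out.closeEntry()
      }
    } finally {
      out.close()
      in.close()
    }
    Files.move(tmp.toPath, jar.toPath, StandardCopyOption.REPLACE_EXISTING)
    jar
  }
}
```

In a build, an override of `packageBin` scoped to the `Compile` configuration of project A could take `packageBin.value`, pass it to `JarTimestamps.normalize`, and return the resulting file. That wiring is an assumption about how one might use the helper, not something tested in this thread.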

Let me know how it goes.

@raboof raboof changed the title from "[not for merge] get closer to repeatable builds by removing timestamps from jars" to "[not for merge] get closer to repeatable builds by removing in-jar timestamps" Aug 5, 2017
@raboof
Contributor Author

raboof commented Aug 5, 2017

Hmm https://github.com/raboof/scala-reproducible-build-sbt-library-dependency doesn't appear to work even with the timestamps intact 🤔

@jvican
Member

jvican commented Aug 5, 2017

What do you mean by "doesn't appear to work" @raboof?

@raboof
Contributor Author

raboof commented Aug 5, 2017

Sorry that was not too helpful :).

When I sbt publishLocal test-lib and then start an interactive sbt session in project-using-test-lib and run, that works. When I then change the code in test-lib and sbt publishLocal it, running project-using-test-lib again from the (still running) interactive session gives me: [error] error while loading Constants, class file '/home/aengelen/.ivy2/local/net.bzzt/test-lib_2.12/0.1-SNAPSHOT/jars/test-lib_2.12.jar(net/bzzt/Constants.class)' is broken.

Did we expect this to work? If we didn't, can you help come up with examples of things we do expect to work (and for which we want to test that they keep working when we strip timestamps)?

@jvican
Member

jvican commented Aug 10, 2017

But you're using exportJars, right? I was not expecting that to work. What about generating the jars with packageBin and then having a dependency on that? What you want to check is that a change produces a different jar (with the same timestamp), and that this jar is detected by Zinc and recompilation is triggered.
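
The first half of that check (a change yields a different jar even when the timestamps are identical) can be done by hand by comparing content digests. A minimal sketch, assuming a hypothetical helper `JarDigest`; it only compares bytes and says nothing about how Zinc itself detects changes.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper for manual experiments.
object JarDigest {
  /** Hex-encoded SHA-256 of the given bytes. */
  def sha256(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map(b => f"$b%02x").mkString

  /** Hex-encoded SHA-256 of the file at `jar`. */
  def sha256(jar: Path): String = sha256(Files.readAllBytes(jar))
}
```

Running `JarDigest.sha256` on the jar before and after a source change should give different digests if the change reached the packaged classes, regardless of what the in-jar timestamps say. Whether recompilation is then triggered in the dependent project still has to be observed in sbt.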

Note: The error you're seeing is because Scalac caches jars when run from sbt or in resident mode...

@retronym
Member

Here's a related Maven plugin that might serve as inspiration.

@raboof
Contributor Author

raboof commented Jan 11, 2018

Right: there are basically 2 ways to make sure there are no in-jar timestamps in results: either not generating the timestamps in the first place, or post-processing the jars to remove the timestamps.

This PR works towards the former, while indeed the maven plugin and the sbt plugin I created on top of that post-process the jars to achieve the same.

For now the post-processing 'scratches my itch', but it would be nicer to solve this 'closer to the source'.

.. and sorting entries by name. This made building a minimal test
application 'repeatable' when building on the same machine but clearing
the target directory.

Related to sbt/zinc#333
@raboof
Contributor Author

raboof commented Feb 8, 2018

I'm afraid I'd need more hand-holding to get this over the finish line, so closing this for now. If anyone wants to reopen/continue, feel free (and I'd be happy to help), but I don't feel qualified to push it forward on my own right now :).

@raboof raboof closed this Feb 8, 2018
@raboof raboof changed the title from "[not for merge] get closer to repeatable builds by removing in-jar timestamps" to "allow passing in a jar timestamp, sort entries by name" Nov 13, 2019
@raboof
Contributor Author

raboof commented Nov 14, 2019

(continued in sbt/sbt#5233 and #279)
